DOCUMENT RESUME

ED 142 562                                              TM 006 173

AUTHOR          Baron, Joan
TITLE           An Exploration of the Implication of the
                M.A.U.T.-Bayesian Decision-Theoretic Model for
                Summative and Formative Evaluation and
                Post-Assessment Organizational Change. Research
                Report Series.
INSTITUTION     Connecticut Univ., Storrs. Bureau of Educational
                Research and Service.
PUB DATE        [Apr 77]
NOTE            27p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (61st, New
                York, New York, April 1977)

EDRS PRICE      MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS     *Bayesian Statistics; Data Collection; *Decision
                Making; *Evaluation Methods; *Formative Evaluation;
                *Models; Organizational Change; Problem Solving;
                Program Effectiveness; Program Evaluation; *Summative
                Evaluation; Values
IDENTIFIERS     Decision Theoretic; Testing; Multiattribute Utility
                Bayesian Decision Model



ABSTRACT 

The philosophy and assumptions of the Multi-Attribute
Utility-Bayesian Decision Theoretic model (MAUT-Bayesian model) are
presented. The evaluator uses the MAUT-Bayesian model, along with
knowledge of the decision-maker's values and perhaps the evaluator's
own, to decide what data should be collected. Appropriate data are
presented to the decision-maker, who weighs the alternatives
suggested by the multiple aspects of the data. Using an educational
policy question as an example, each step of this decision-making
process is described. Implications for formative and summative
evaluations and post-assessment organizational change are offered.
(Author/MV) 






An Exploration of the Implication of the M.A.U.T.- 
Bayesian Decision-Theoretic Model for Summative 
and Formative Evaluation and 
Post-Assessment Organizational Change 

by 

Joan Baron 
University of Connecticut 



A paper presented to the American Educational Research Association 1977
Annual Meeting, April 4-8, 1977, New York City.

The author wishes to thank Drs. Edward Iwanicki, Robert Gable and
Marcia Guttentag for their helpful remarks on an earlier version of
this paper.




The major goal of this paper is to familiarize the reader with the
Multi-Attribute Utility-Bayesian Decision Theoretic model of evaluation. The
first part of the paper explores the philosophy and assumptions of the
model; the second section provides a step-by-step application; the final
section discusses its implications for formative and summative evaluations
and post-assessment organizational change.

Philosophy and Assumptions of M.A.U.T. - Bayesian Decision-Theoretic Model 

The role of the evaluator in the M.A.U.T. - Bayesian model is that of
a facilitator for decision-making. The evaluator collects data and presents
them to the decision maker, who will then make a decision. Perhaps the most
important question an evaluator must answer is, "What data should be collected?"
It is in answering this question that the M.A.U.T. - Bayesian model is most
useful, as it is derived from the assumption that people make decisions by
evaluating the various entities (alternatives) on many relevant value dimensions
(see Raiffa, 1968, pp. IX-X). Generally, people have certain minimum criteria
which must first be met. After that, the alternatives are weighed and a
decision is made. In a decision to purchase one of two houses, after certain
size and price criteria have been satisfied, houses will differ on location,
state of repair, amount of insulation, etc. Each of these dimensions will be
considered and a final decision will be made. People are routinely called upon
to choose between apples and oranges. And they do it. Returning to the
question above regarding what data should be collected, the answer is
that data should be collected on whatever value dimensions the decision maker
considers to be important.

If two programs are to be compared, certainly data will be collected
on the programs' effectiveness. As in most program evaluations, this will be
the most important dimension. However, many additional factors may also be
important. For example, the cost of the program, the amount of training
required, attitudinal changes of the participants, etc. may be weighed in the
decision-making process. The M.A.U.T. - Bayesian model acknowledges the
multifaceted complexity of decision-making and attempts to quantify the process by
isolating the values held by the decision-maker and prioritizing them in the
same way he or she does when making the decision. Data will then be collected
by the evaluator to determine the extent to which the program succeeds on the
dimensions which are important. One may ask whether the evaluator should
inject his or her own values into the evaluation and collect data on those. He or she
may decide to do so. However, it should be recognized that the decision-maker
may elect to ignore those data if he/she does not value the dimension, even
after being confronted with the data. This may be part of the reason why
many well-intentioned evaluations are put into a drawer and never used. The
evaluator may have provided information on program effectiveness which is of
little importance to the decision-maker. Furthermore, the evaluation may have
contained no information on dimensions which were important to the decision.

Footnote: It will be clear after the second section of this paper that the
additional data must be presented separately and not included in the matrix.

It should be stated that the M.A.U.T. - Bayesian model encourages the
use of experimental and quasi-experimental designs whenever appropriate and
possible. Data should be collected using the principles of Campbell and
Stanley (1963) and Cook and Campbell (1975). The use of control groups
wherever feasible is strongly advocated, particularly when evaluating the
program's effectiveness.

The Bayesian aspect of the model proceeds from the belief that people
have ideas regarding the probabilities for certain events to occur, and
preferences or utilities for those consequences which are independent of the
probabilities. Edwards et al. (1963) provide the following illustration:

What action is wise of course depends in part on what is at
stake. You would not take the plane if you believed it would
crash, and would not buy flight insurance if you believed it
would not. Seldom must you choose between exactly two acts,
one appropriate to the null hypothesis and the other to its
alternative. Many intermediate, or hedging, acts are
ordinarily possible; flying after buying flight insurance, and
choosing a reasonable amount of flight insurance, are
examples. (p. 214)

Footnote: A discussion of "pseudo-experiments" in Edwards et al. (1975, pp. 143-145)
urges the reader to be wary of using control groups which do not control.
They urge the use of convergent validity to remedy the limitations often
confronted in field settings where randomization is not possible and
programs change continuously.

Footnote: "In the Bayesian approach to statistics, an attempt is made to utilize all
available information in order to reduce the amount of uncertainty present
in an inferential or decision-making problem. As new information is obtained,
it is combined with any previous information to form the basis for statistical
procedures. The formal mechanism used to combine the new information with
the previously available information is known as Bayes' theorem; this explains
why the term "Bayesian" is often used to describe this general approach to
statistics... When new information is obtained, probabilities are revised in
order that they may represent all of the available information." (Winkler,
1972, p. 2)

The decision maker concerned with a program evaluation generally has ideas
regarding the program's effectiveness prior to the time that the evaluator
arrives. After the data are amassed, the original probabilities are either
confirmed or disconfirmed. Edwards et al. (1963, p. 208) wrote:

"If it were meaningful utterly to ignore prior opinion, it might
presumably sometimes be wise to do so; but reflection shows that
any policy that pretends to ignore prior opinion will be
acceptable only insofar as it is actually justified by prior opinion.
Some policies recommended under the motif of neutrality, or using
only the facts, may flagrantly violate even very confused prior
opinions, and so be unacceptable."

In a later part of their discussion, Edwards et al. (p. 212) point out,
citing unpublished work by Hays et al., that in reality people tend to
disbelieve evidence which does not confirm their original beliefs:

"Subjects are unwilling to change their diffuse initial opinions
into sharp posterior ones, even after exposure to overwhelming
evidence. This reluctance to extract from data as much certainty
as they permit may be widespread. If so, explicit application of
Bayes' theorem to information processing tasks now performed by
unaided human judgment may produce more efficient use of the
available information."

It is for the above reasons that an inferential system which closely
mirrors the way in which people process discrepant data would tend to be more
useful to the decision maker. The reader who wishes to pursue these issues
is urged to read Edwards et al. (1963) in its entirety. Appendix A below
reproduces their Figure 2, which graphically illustrates how two very
different prior judgments are altered by data so that the posterior curves
begin to resemble each other after the data are amassed. An understanding of
this concept is essential to understanding the way in which subjective prior
judgments are recast into posterior probabilities through the use of data.
It will also aid the evaluator in determining how much data would be necessary
to alter the prior probabilities. (For an application of the Bayesian approach
in an evaluation setting, see Edwards et al., 1975, pp. 175-177.)
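
The way two different prior judgments draw together once data are amassed can be illustrated with a small numerical sketch. It is not taken from Edwards et al.; it assumes a conjugate Beta prior on a program's success rate and binomial data, and the two priors and the data counts are hypothetical.

```python
# Illustrative sketch (not from the paper): two people hold different
# Beta priors about a program's success rate and see the same data.

def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate Beta-binomial update: returns the posterior (a, b)."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical priors: a skeptic (mean 0.2) and an optimist (mean 0.8).
skeptic = (2.0, 8.0)
optimist = (8.0, 2.0)

# Hypothetical data: 60 successes among 100 observed students.
successes, failures = 60, 40

post_skeptic = beta_posterior(*skeptic, successes, failures)
post_optimist = beta_posterior(*optimist, successes, failures)

prior_gap = abs(beta_mean(*skeptic) - beta_mean(*optimist))
posterior_gap = abs(beta_mean(*post_skeptic) - beta_mean(*post_optimist))

print(f"gap between prior means:     {prior_gap:.3f}")
print(f"gap between posterior means: {posterior_gap:.3f}")
```

Under these assumed numbers the gap between the two opinions shrinks sharply after the data are seen. Rerunning the sketch with fewer observations gives a rough sense of how much data would be needed to move a firmly held prior.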



An Explication and Application of the M.A.U.T. - Bayesian Model



The second goal of this paper is to explicate for practicing evaluators
the decision-theoretic approach to evaluation research by applying it to a
simulated evaluation problem. The approach as espoused by Edwards,
Guttentag, and Snapper (1975) provides a methodological and statistical
framework for using evaluation as the input for intelligent decision making.
Evaluators frequently acknowledge the existence of the decision-theoretic
approach, otherwise known as the Multi-Attribute Utility Analysis (M.A.U.T.) or
Bayesian approach. However, due to its seeming complexity it has not been
frequently employed by evaluators not specially trained in it.

For the purposes of applying the decision-theoretic model, we will
use a hypothetical alternative program such as those prevalent in
Philadelphia (e.g., Parkway school-without-walls/storefront-type school) for
culturally and academically disadvantaged potential drop-outs. We will assume
that the high school has been in existence for three years and the School
Board is making a decision as to whether the storefront alternative should
be allowed to continue in its present form. If not, should it be modified
to resemble School X or disbanded with the children reentering the
traditional high school? (See Roberts (1975), Chapter 6, for descriptions of
similar programs.)



Footnote: This example uses contrived data and was presented by Baron (1976). The
10 methodological steps of the model were taken from Edwards et al., 1975.
Further elaborations may be found in Guttentag (1973) and Guttentag and
Snapper (in press).

By way of preview, the essence of the decision-theoretic approach is
to find out what the values of each primary interest group are and then
measure the extent to which each of these values is being met by each of
the three programs being considered. The one that does the best on the
overall basis is the one to be chosen. The M.A.U.T. model delineates a set
of 10 steps to follow in achieving this end.

Step 1. It must be determined whose utilities are to be maximized.
That is, what are the various prime interest groups affected by and affecting
the decision? In this situation, one is concerned with the values of those
on the School Board, the teachers and administrators in the school, the
parents and the students. Edwards et al. (p. 153) claim that "everyone who has
a stake and voice in the decision must be identified and people who can
speak for them must be identified and induced to cooperate."

Step 2. One must clarify the purpose for which the evaluation is being
conducted, as the same objects or acts may have different values depending
on the context and purpose. This has been identified above as the desire to
select from among three alternatives: the storefront school, a modified
alternative program, and a traditional school program.

Step 3. The alternatives or entities being evaluated should be specified.
(This is the same as step 2 in this particular situation.)

Step 4. This is the first technical task. It requires the discovery of
what dimensions of value are important to the evaluation of the entities
being decided upon. Edwards et al. recommend stating these as general
dimensions, e.g., acquisition of reading skills, whereas Iwanicki (1976)
recommends using specific behavioral objectives focusing on actual student
behaviors. In this study, we will attempt the latter. It is critical to
mention that a separate list will be drawn up by each separate group. This
step merely lists the important values and dimensions; no attempt is made to
judge whether a particular entity or program succeeds on that dimension. It
should be noted here that to the extent to which each group has previously
done a needs assessment the task will be simplified. Some possible dimensions
generated by three of the groups in our storefront school evaluation are
listed in Table I.



Table I

Some Possible Value Dimensions Generated by
Students, Teachers and Parents

Students' Value Dimensions:

We will learn the basic skills.
We will be prepared for a job.
We will know how to solve problems and make decisions.
We will be independent.
We will feel good about ourselves.
We will feel as though the teachers like us and care about us.

Teachers' Value Dimensions:

The students will stay in school instead of dropping out.
The students will have a basic sense of self-worth, self-confidence,
   independence.
The students will learn the basic skills of communication and computation.
The students will have a level of career aspiration commensurate with their
   ability.
The students will have a sense of social responsibility and dependability.
The students will have mastered some techniques for problem solving and
   decision-making.
The students will show pride in the quality of their work.
The students will have a basic appreciation of aesthetics and some meaningful
   options for leisure-time activities.

Parents' Value Dimensions:

The students will stay in school instead of dropping out.
The students will be prepared for a good job.
The students will have mastered the basic skills.
The students will be dependable and responsible.
The students will know how to make decisions and solve problems.
The students will be independent.
The students will be confident and feel a sense of self-worth.



It will quickly be noticed that there are some goals which appear on all
three lists and some which appear on only one or two.

This step is very similar to what Renzulli recommends in his Front End
Analysis. "At the end of the Front End Analysis the evaluator should be able
to list the major concerns of each prime interest group and these concerns
should be classified and organized according to similarities between the
groups." The major difference between Renzulli's approach and this one is
that no attempt will be made to merge the different lists. Each list will
be evaluated separately and fed back to the group which generated it.
Iwanicki (p. 13) also acknowledged the collaborative aspect of developing an
evaluation program. At the secondary school level he advocates that "the
persons responsible for planning and implementing the evaluation program
should make every effort to involve the school staff in this process." He
makes no mention of the students' voice in developing the evaluation program.

Step 5. This step consists of ranking the dimensions in order of
importance. This ranking job can be performed either by individuals acting
separately or in a group. According to Edwards, Guttentag and Snapper the
preferred technique (p. 155) is to "try group process first, mostly to get
the arguments on the table and to make it more likely that the participants
start from a common base." Disagreements within groups at steps 5 and 6 seem
to be due to conflicting values and Edwards et al. "wish to respect them as
much as possible... For that reason, we feel that the judges who perform
steps 5 and 6 should either be the decision maker(s) or well-chosen
representatives. Considerable discussion, persuasion and information exchange should
be used in an attempt to reduce the disagreements as much as possible." They
realize that this "will seldom reduce to zero" and state that "one function of
an executive is to resolve disagreements among subordinates. If no resolution
is possible we can only do an evaluation separately for each of the
disagreeing individuals or groups, hoping that the disagreements are small
enough to have little or no action implications."

For an example of ranking the dimensions in order of importance, refer
to Table I under Teachers' Value Dimensions. These were listed in order of
importance.

Step 6. In this step, the dimensions will be rated in importance,
while preserving the ratios between them. The first step is to
assign to the least important dimension an importance weight of 10. The
next most important dimension will be assigned a number that reflects its
ratio of importance relative to the one below it, assigned a 10. The
evaluator will continue up the list recording the group's assigned weights and
checking each set of implied ratios as each new judgment is made. Thus, if
a dimension is assigned a weight of 20 while the one above it is assigned a
weight of 80, this means that the dimension worth 20 is 1/4 as important as
the one worth 80. By the time the most important dimension is assigned a
value, there will have been revisions made to make previous judgments
consistent with later ones. Revisions are very much in the spirit of the
flexibility, change and openness encouraged by this process. For illustration,
weights will be assigned in Table II to the Teachers' Value Dimensions.



Footnote 5: "A special case arises when one of the dimensions such as cost is subject
to an upper bound, i.e., there are budget constraints. In that case,
steps 4-10 should be done ignoring the constrained dimension. Then
benefit-to-cost ratios will be calculated. In the absence of budget constraints,
cost is just another dimension of value, to be treated on the same footing
as all other dimensions of value, entering into U_i with a minus sign,
like other unattractive dimensions." (This will make more sense later.)

Table II
Teachers' Value Dimensions

 10   Aesthetics.
 15   Pride in work. (slightly more important)
 30   Problem-solving and decision making. (double pride above, triple
      aesthetics)
 30   Responsibility and dependability. (same as No. 3)
 50   Level of aspiration. (5 times more important than aesthetics)
100   Basic skills. (twice as important as level of aspiration, 10 times
      more than aesthetics)
100   Self-worth. (same as basic skills)
100   Keep students from dropping out. (same as basic skills and self-worth)

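
The ratio checking described in Step 6 can be sketched in a few lines. This is only an illustration: the weights and the stated ratios are those of Table II, while the function name and the "consistent/revise" wording are assumptions introduced here, not part of the Edwards et al. procedure.

```python
# Raw importance weights from Table II (teachers' value dimensions).
weights = {
    "Aesthetics": 10,
    "Pride in work": 15,
    "Problem solving and decision making": 30,
    "Responsibility and dependability": 30,
    "Level of aspiration": 50,
    "Basic skills": 100,
    "Self-worth": 100,
    "Dropout prevention": 100,
}

def implied_ratio(weights, a, b):
    """How many times as important dimension a is relative to dimension b."""
    return weights[a] / weights[b]

# Ratios stated in the parenthetical remarks of Table II.
stated = [
    ("Problem solving and decision making", "Pride in work", 2),
    ("Problem solving and decision making", "Aesthetics", 3),
    ("Level of aspiration", "Aesthetics", 5),
    ("Basic skills", "Level of aspiration", 2),
    ("Basic skills", "Aesthetics", 10),
]

# Flag any stated judgment that the recorded weights do not reproduce.
for a, b, expected in stated:
    ratio = implied_ratio(weights, a, b)
    status = "consistent" if ratio == expected else "revise"
    print(f"{a} / {b}: implied {ratio:g}, stated {expected} -> {status}")
```

A check of this kind is what allows earlier judgments to be revised as soon as a later weight implies a ratio the group does not accept.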
Step 7. After the value dimensions have been weighted, Edwards et al.,
1975, define the following "computational step which converts importance
weights into numbers that are mathematically rather like probabilities. The
importance weights will be summed, each weight will be divided by the sum
and multiplied by 100. The choice of a 0-to-100 scale is, of course, purely
arbitrary. At this step, the consequences of including too many dimensions
at Step 4 become glaringly apparent. If 100 points are to be distributed
over a set of dimensions and some dimensions are very much more important
than others, then the less important dimensions will have non-trivial weights
only if there aren't too many of them. As a rule of thumb, 8 dimensions is
plenty and 15 is too many. Knowing this, one will want at Step 4 to
discourage respondents from being too finely analytical; rather gross dimensions
will be just right. Moreover, it may occur that the list of dimensions will
be revised later, and that revision, if it occurs, will typically consist of
including more rather than fewer." As an illustration, the weights listed in
Table II will be elaborated in Table III where they will be summed and each
will be divided by the sum and multiplied by 100. It can be observed that
the ratios of importance have been preserved in this process. Aesthetics
with a value of 2.29 continues to be ten times less important than basic skills
with a value of 22.98, paralleling step 6 with 10 and 100.

Table III

Illustration of Summing and Dividing Value Weights

Value dimension                        Weight    Weight / Sum (x 100)
Aesthetics                                10      10/435 =  2.29
Pride in work                             15      15/435 =  3.44
Problem solving and decision making       30      30/435 =  6.89
Responsibility and dependability          30      30/435 =  6.89
Level of aspiration                       50      50/435 = 11.49
Basic skills                             100     100/435 = 22.98
Self-worth                               100     100/435 = 22.98
Drop out prevention                      100     100/435 = 22.98
Sum =                                    435     Sum = 100

(These have been multiplied by 100.)
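
The summing and dividing shown in Table III can be reproduced with a short sketch. The raw weights are those of Table II; the `truncate2` helper is an assumption introduced here, since the printed figures appear to be truncated rather than rounded to two decimal places.

```python
import math

# Raw importance weights in the order listed in Table III.
raw_weights = [10, 15, 30, 30, 50, 100, 100, 100]
total = sum(raw_weights)

# Divide each weight by the sum and multiply by 100 (Step 7).
normalized = [100 * w / total for w in raw_weights]

def truncate2(x):
    """Cut a number off after two decimals without rounding (assumed)."""
    return math.floor(x * 100) / 100

printed = [truncate2(w) for w in normalized]

print(total)
print(printed)
```

Because every weight is divided by the same sum, the ratios between dimensions are unchanged: the untruncated weight for basic skills remains ten times that for aesthetics.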





Step 8. To recapitulate for a moment: a matrix can now be set up for
each primary interest group. The values will be listed across the top, one
per column, with the value assigned to it in step 7. The rows down the side
represent the various alternatives to be weighed in the decision: the
storefront school, a modification, a return to the traditional school. Our next task is
to fill in each cell of the matrix, i.e., "to measure the location of each
entity being evaluated on each dimension..." It should be stressed that
this matrix is subject to modification at any point in time. Groups can add
or delete goals and/or alternatives. This would be in line with Iwanicki's
recommendation (p. 13) that "as the evaluation program is being developed and
implemented the staff should have the opportunity to systematically review
its effectiveness and make modifications where necessary." He and the
proponents of the decision-theoretic model share the view that this refinement
process is "essential to improved evaluation and decision making."

The next task is to select evaluation instruments to use in collecting
the information which will be used in each of the cells in the matrix. The
two criteria suggested by Iwanicki would be useful here:

1. The extent to which the instrument accurately measures the
   objectives of the program being evaluated.

2. The convenience with which the results provided by the instrument
   can be used to make decisions about the students' achievement of
   the programs' objectives.

However, he warns that the selection of quality instruments does not always
ensure that accurate feedback will be collected. Care must be taken to see
that the instruments are administered properly.

Footnote: When programs do not yet exist and are potential new options, these judgments
are no more than educated guesses. As the program proceeds, data are
gathered and the standard techniques of Bayesian statistics can be used to
update the initial guesses as data accumulate (Edwards et al., 1975).

Footnote: The decision-maker and/or program experts may be helpful in recommending
instruments they have faith in.

According to Edwards et al. (p. 156) there are three classes of
dimensions (purely subjective, partly subjective, and purely objective):

The purely subjective dimensions are perhaps the easiest; you
simply get an appropriate expert to estimate the position of that
entity on that dimension on a 0-to-100 scale, where 0 is defined
as the minimum plausible value on that dimension and 100 is
defined as the maximum plausible value. A partly subjective
dimension is one in which the units of measurement are objective,
but the locations of the entities must be subjectively estimated.

A wholly objective dimension is one that can be measured rather
objectively, in objective units, before the decision. For partly
or wholly objective dimensions, it is necessary to have the
estimators provide not only values for each entity to be evaluated,
but also minimum and maximum plausible values, in the natural
units of each dimension. (p. 156)

According to Edwards et al., "The final task in step 8 is to convert measures
in the partly subjective and wholly objective dimensions into the 0-to-100
scale in which 0 is the minimum plausible and 100 is the maximum plausible.
A linear transformation is almost always adequate for this purpose; errors
produced by the linear approximations to monotonic nonlinear functions are
likely to be unimportant relative to test-retest unreliability,
interrespondent differences, and the like." (p. 156)
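
The linear transformation onto the 0-to-100 scale can be sketched as follows. The function name, the attendance example and its plausible range are hypothetical, and clamping out-of-range estimates is an assumption added here rather than something the quoted passage prescribes.

```python
def rescale(value, min_plausible, max_plausible):
    """Linearly map a measurement in natural units onto the 0-to-100 scale."""
    if max_plausible <= min_plausible:
        raise ValueError("max_plausible must exceed min_plausible")
    # Assumption: estimates outside the plausible range are clamped to it.
    clamped = min(max(value, min_plausible), max_plausible)
    return 100 * (clamped - min_plausible) / (max_plausible - min_plausible)

# Hypothetical example: an attendance rate of 90%, with a plausible
# range running from 60% (minimum) to 100% (maximum).
attendance_score = rescale(90, 60, 100)
print(attendance_score)
```

The minimum plausible value maps to 0 and the maximum plausible value to 100, so the result depends as much on the plausible range supplied by the estimators as on the measurement itself.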

At the completion of step 8, all entities (alternatives) have been
located on the relevant value dimensions and the location measures have been
rescaled. Therefore, there will be a number from 0 to 100 in each cell of
the matrix.



Before examining a completed matrix, it might be helpful to the reader
to go through some of the thinking that generates the numbers inside the
cells of the matrix. Each of the teachers' values will be listed with a
brief discussion of a possible choice of instrumentation and its scoring
procedure.

Dropout prevention: This would be a purely objective dimension. We know
our expected drop out rate in the traditional setting. We would calculate
the actual drop out rate and compare the two by way of a proportion:

$$\frac{\text{Actual}}{\text{Expected}} = \frac{\text{location score}}{100}$$

E.g., $\frac{25\ \text{actual}}{50\ \text{expected}} = \frac{50}{100}$, so 50 would be entered into that cell
in the matrix.

Self-worth: Here one might use multiple measures. Standardized self-concept
tests might be used. These would have to be compared with the expected
scores attained in the traditional setting. Then, as above, the proportion
would be rescaled on a 0-to-100 scale. Another powerful measure of
self-concept would be the use of interviews with the students, teachers, and
parents. Many examples can be found in Roberts (1975), chap. 7, but two
examples of parent responses are:

"My son feels very good." "My son is always talking about school."

Basic skills: As in self-worth, standardized tests are one way to determine
whether the storefront school is succeeding in teaching basic skills. Work
samples and teacher interviews would also be useful. Parent and student
interviews might also be relevant. Data will be averaged and put into a
0-to-100 scale for inclusion in the matrix.

Level of career aspiration: This area might need a longitudinal approach
with close monitoring. Comparisons could be done within the group (at the
beginning and end of the child's experience in the various schools) and
between the groups in determining the value between 0 and 100 to put into
the three cells of the matrix.

Social responsibility and dependability: Behavioral measures would be useful
here. Since the students in the storefront school are out in the community
(career opportunities, apprenticeships, working in politics), and assuming
administrative and teaching roles within the school, it would be profitable
to interview or send questionnaires to the adults working with the students.
Criteria might include tardiness, attendance record, etc. Comparisons
might be made against a perfect performance or against the traditional school
in filling in the three cells with numbers from 0 to 100.

Problem solving and decision making: Multiple measures would again be useful
here, i.e., standardized tests and on-the-job experiences. As above, these
would be rescaled so as to be averaged and then included in the cells with
0-to-100 scores.

Pride in work: Standardized tests would not be relevant here. We could
look at work samples and interview the students and parents about the
students' attitudes toward that work. Comparisons could be made with the
traditional setting. E.g., "Teachers care about you because if you don't do
your work they 'lean on you.'... They go over your work with you..."

Aesthetics and Leisure Time: Interviews, questionnaires, and logs might be
useful in ascertaining whether the students in the storefront school have a
different aesthetic appreciation and/or use of leisure time from those in
the more traditional settings. These will be scaled from 0 to 100.
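
Several of the dimensions above call for multiple measures that are rescaled and then averaged into a single cell value. A minimal sketch of that averaging follows; the measure names and scores are hypothetical, and an unweighted mean is assumed because the text does not say how the measures should be weighted.

```python
from statistics import mean

def combine_measures(scores):
    """Average several 0-to-100 location scores for one dimension."""
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} is not on the 0-to-100 scale: {score}")
    # Assumption: every measure counts equally toward the cell value.
    return mean(list(scores.values()))

# Hypothetical rescaled scores for basic skills in one program.
basic_skills = {
    "standardized test": 80,
    "work samples": 90,
    "teacher interviews": 85,
}

cell_value = combine_measures(basic_skills)
print(cell_value)
```

Each measure must already be on the 0-to-100 scale before it is averaged; otherwise measures recorded in larger natural units would dominate the cell.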

At this point, it might be useful to discuss the subject of unintended or
unanticipated outcomes. According to Finkelstein and Pollack-Schloss (Chap.
7 in Roberts), the storefront type of school is not without some costs.
"There is a certain degree of role confusion among students, teachers, and
administrators... There is some adult hesitancy in setting standards with
fear on the part of everyone that the program will be misunderstood and
terminated... This in turn produces a high degree of defensiveness which
inhibits programmatic self-examination and learning." (p. 83) (An even
longer list of positive unanticipated outcomes could also be listed here.)
These unexpected outcomes could be incorporated into the matrix if the
various interest groups felt that they were important in making a
determination of whether to continue the storefront school, i.e., if they considered
them to be important value dimensions. It should also be pointed out that,
as Edwards et al. claim, the distinction between summative and formative is
no longer a meaningful one when using the D-T approach. At every point in
time, the data that can be gleaned from the matrix can be used both
summatively and formatively. Examples of some hypothetical numbers placed into
the cells will be found in Table IV.

Step 9. In this step, utilities will be calculated for each entity. 
Table IV illustrates this procedure by including two numbers in every cell 
of the matrix. The first is a number from 0 to 100 which represents the 
degree to which each entity succeeds on each value dimension. (This is the 
output of step 8.) The second number in each cell is the product of the first 
number and the normalized importance weight of each value at the top of each 
column.[7] These products are then summed across each row.[8] 



                                        Table IV

                    Hypothetical Numerical Values in Completed Matrix

Program      Dropout   Self-     Basic     Career    Social    Prob.     Pride     Aesthetics   TOTAL
             prevent.  worth     skill     aspir.    respon.   solv.     in wk.    & leisure

Weight       22.98     22.98     22.98     11.49     6.89      6.89      3.44      2.29

STOREFRONT   (50)      (85)      (85)      (90)      (80)      (85)      (95)      (75)
             1149      1953.3    1953.3    1034.1    551.2     585.65    326.8     171.75       7725.1

MODIFIED     (15)      (30)      (55)      (45)      (20)      (40)      (30)      (15)
SCHOOL       344.7     689.4     1263.9    517.05    137.8     206.7     103.2     34.35        3297.1
PROGRAM

TRADITIONAL  (0)       (15)      (40)      (40)      (5)       (20)      (10)      (10)
SCHOOL       0         344.7     919.2     459.6     34.45     137.8     34.4      22.9         1953.05



Step 10. The final step consists of making the decision. If a single 
alternative is to be chosen, one might look at the totals at the right of 
each row in Table IV and select the program which has the highest total. 
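
The weighted-sum rule of Steps 9 and 10 can be sketched in a few lines of Python. This is only an illustrative sketch: the weights and 0-to-100 scores are the hypothetical figures from Table IV, while the names `WEIGHTS`, `SCORES`, `utility`, and `best_program`, and the spelled-out dimension labels, are assumptions introduced here rather than part of the procedure itself. Because the totals are recomputed from the scores, a row total can differ from the printed one wherever a printed cell product does not equal score times weight.

```python
# Normalized importance weights from the top of each Table IV column.
# The dimension labels are spelled-out versions of the column abbreviations.
WEIGHTS = {
    "dropout prevention": 22.98,
    "self-worth": 22.98,
    "basic skills": 22.98,
    "career aspirations": 11.49,
    "social responsibility": 6.89,
    "problem solving": 6.89,
    "pride in work": 3.44,
    "aesthetics and leisure": 2.29,
}

# Hypothetical 0-to-100 scores for each program, in the same column order.
SCORES = {
    "STOREFRONT": [50, 85, 85, 90, 80, 85, 95, 75],
    "MODIFIED SCHOOL PROGRAM": [15, 30, 55, 45, 20, 40, 30, 15],
    "TRADITIONAL SCHOOL": [0, 15, 40, 40, 5, 20, 10, 10],
}


def utility(scores, weights):
    """Return U_i = sum over j of w_j * u_ij for one program (Step 9)."""
    return sum(w * u for w, u in zip(weights, scores))


def best_program(score_table, weights):
    """Return (name, utility) of the program with the highest total (Step 10)."""
    totals = {name: utility(row, weights) for name, row in score_table.items()}
    winner = max(totals, key=totals.get)
    return winner, totals[winner]


if __name__ == "__main__":
    w = list(WEIGHTS.values())
    for name, row in SCORES.items():
        print(f"{name}: {utility(row, w):.2f}")
    print("Choose:", best_program(SCORES, w)[0])
```

Keeping the weights separate from the scores means the same score table can be re-evaluated with another interest group's weights simply by passing a different weight list.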



7 

The formula for a weighted average: $U_i = \sum_j w_j u_{ij}$, with $\sum_j w_j = 100$, where 

$w_j$ = normalized importance weight for the jth dimension (output of step 7) 
$u_{ij}$ = rescaled position of the ith entity on the jth dimension (output of step 8) 

The utility for a given entity is proportional to the sum of the probabilities, 
each multiplied by the appropriate importance weight. 

8 

It should be mentioned that utility scores are often useful under 
non-experimental conditions, i.e., with no control group or randomization. 
Furthermore, Murphy (1974) has suggested that comparing utilities at 
different times (Priors and Posteriors) may highlight ways in which the program 
is not performing as expected. Disparity between the wholly subjective 
priors and the data-based posteriors could indicate that the program should 
be modified or that additional research effects might be required. In 
particular, such analysis may show that a program should be modified to 
better meet the needs of a specific subgroup of clients." (Edwards 
et al., p. 49). 



9. 



This hypothetical illustration does not illustrate the use of Bayesian 
statistics. For an example of Bayesian revisions, see Edwards et al. 
(pp. 175-177). 
In the above example it is very obvious that we would choose to continue 
the storefront school. Its total is more than twice that of the modified 
school's program and almost four times that of the traditional school's 
program (i.e., 7725.1 vs. 3297.1 vs. 1953.1). But it should also be noted 
that much more than a single highest sum could be derived from a 
decision-theoretic or M.A.U.T. analysis. One could do many subanalyses as well. If 
one wanted to know which program maximized a particular value, one need only 
look down the column which measured that value across the different programs. 
One could also use it to see where future efforts are needed. For example, 
one might try to improve the 50 under dropout prevention to more closely 
reach 100. It should also be noted that because different groups generate 
different matrices, it is possible for one program to be the most successful 
for one group and a different program to be the most successful for a different 
interest group. The above matrix was generated from the teachers' values. 
It is conceivable, though not likely, that the traditional school's program 
might come out highest if one considers the values of a different subgroup. 
(see Table I) 
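
The two subanalyses just described, reading down a column to find the program that does best on one value and looking along a row for the dimension most in need of improvement, might be sketched as follows. The scores are the hypothetical teacher-matrix figures from Table IV; the function names and the spelled-out dimension labels are assumptions made purely for illustration.

```python
# Column order of Table IV (labels are spelled-out column abbreviations).
DIMENSIONS = [
    "dropout prevention", "self-worth", "basic skills", "career aspirations",
    "social responsibility", "problem solving", "pride in work",
    "aesthetics and leisure",
]

# Hypothetical 0-to-100 scores from Table IV, one row per program.
SCORES = {
    "STOREFRONT": [50, 85, 85, 90, 80, 85, 95, 75],
    "MODIFIED SCHOOL PROGRAM": [15, 30, 55, 45, 20, 40, 30, 15],
    "TRADITIONAL SCHOOL": [0, 15, 40, 40, 5, 20, 10, 10],
}


def best_on_dimension(scores, dimensions, dimension):
    """Look down one column: which program scores highest on this value?"""
    j = dimensions.index(dimension)
    return max(scores, key=lambda name: scores[name][j])


def weakest_dimension(scores, dimensions, program):
    """Look along one row: return (dimension, shortfall from 100) for the lowest score."""
    row = scores[program]
    j = min(range(len(row)), key=row.__getitem__)
    return dimensions[j], 100 - row[j]


if __name__ == "__main__":
    print(best_on_dimension(SCORES, DIMENSIONS, "basic skills"))
    print(weakest_dimension(SCORES, DIMENSIONS, "STOREFRONT"))
```

For the storefront row the lowest score is the 50 under dropout prevention, which is the cell the text singles out for future effort.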

In concluding, let us turn to Renzulli's remarks on the five essential 
ingredients of a well-executed evaluation: (p. 5) 

1. To discover whether and how effectively the objectives of a 
program are being fulfilled. 

2. To discover unplanned and unexpected consequences that are re- 
sulting from particular program practices. 

3. To determine the underlying policies and related activities 
that contribute to success or failure in particular areas. 

4. To provide continuous in-process feedback at intermediate 
stages throughout the course of a program. 

5. To suggest realistic, as well as ideal, alternative courses 
of action for program modification. 
Implications for Summative and Formative Evaluations and Post-Assessment 
Organizational Change 

Edwards et al. (1973) prefer a planning orientation to that of summative 
and formative evaluations. However, because most evaluators are familiar with 
the distinction between the two, they will be discussed in turn. In a 
summative evaluation using the M.A.U.T. - Bayesian model the decision maker will 
select the entity with the highest utility. This assumes that either the 
decision maker finds the information consistent with his prior beliefs or that 
the evaluator has collected sufficient evidence to be convincing in 
overwhelming his original prior judgments. It may be necessary to collect additional 
data in order to dispel ambivalence. It may have occurred to the reader that 
the other important prerequisite for a useful evaluation is that the decision 
maker be honest concerning his goals and values. If there are hidden agendas 
and extraneous political pressures influencing the decision, these may render 
the evaluation inappropriate. These limitations are not limitations of the 
model; they are, in contrast, real limitations existing in the world in which 
evaluations are conducted. One might ask whether the traditional mode of 
evaluation which addresses only program effectiveness is better suited to these 
limitations. The M.A.U.T. - Bayesians think not. Therefore, it is quite 
important to determine early in the evaluation process whether a real decision 
is at stake and whether the decision maker is honestly communicating his goals 
and priority values. To the extent that these criteria are violated, the 
evaluation may become a charade. 

The goal of formative evaluation is program improvement. In spirit, 
the M.A.U.T. - Bayesian model closely resembles the model of evaluation called 
for in Ross and Cronbach (1976, p. 18). 

"(1) Evaluation can constructively enter the picture earlier and 
can be seen as a continuing part of management rather than as a 
short-term consulting contract. (2) The evaluator, instead of 
running alongside the train making notes through the windows, can 
board the train and influence the engineer, the conductor, and 
the passengers. (3) The evaluator need not limit his concerns 
to objectives stated in advance; instead, he can also function 
as a naturalistic observer whose inquiries grow out of his 
observations. (4) The evaluator should not concentrate on 
outcomes; ultimately, it may prove more profitable to study 
just what was delivered and how people interacted during the 
treatment process. (5) The evaluator should recognize (and 
act upon the recognition) that systems are rarely influenced 
by reports received through the mail. Evaluation thus becomes 
a component of the evolving program itself, rather than 
disinterested monitoring undertaken to provide ammunition to the 
warring factions in a political struggle. Formal reports to 
outsiders are reduced in significance and research findings 
become not conclusions, but updating of the system's picture 
of itself." (p. 18) 

The picture of the active participant evaluator drawn by Ross and 
Cronbach is perfectly consistent with the M.A.U.T. - Bayesian model. However, 
in order for the above approach to be possible the decision-maker must 
subscribe to it. To the extent to which the decision-maker is truly open to 
modifications and improvement and restructuring his program, the M.A.U.T. - 
Bayesian evaluator can be helpful. 

It should be noted that the ways in which a program(s) can be improved 
often become evident as soon as the goals are listed. Before any data are 
collected, it may become obvious that unless a particular aspect of the program 
is modified, there is little or no chance of succeeding on a particular goal. 
Often, the weaknesses of a program will emerge when one asks the program 
director for his prior estimates of how well the program is likely to succeed on 
each dimension. 



How to phrase the questions when determining prior probabilities is an area 
ripe for research. One must be careful to ask for the priors commensurate 
with the time at which the data are to be collected. If priors about the 
ultimate success of the program are compared with data collected at the 
beginning of the program there will be unnecessary and possibly misleading 
discrepancies. Edwards et al. (1963) have an important section on priors, 
but they do not discuss this issue. Two techniques for generating priors 
may be found in Raiffa, 1968 (pp. 161-166) and Novick and Jackson, 1974 
(pp. 160-166). The Raiffa approach uses fractiles and the Novick and 
Jackson approach uses a computer-assisted interrogation concerning sample 
size determination. 
If prior estimates are either unrealistic or dishonest, or if the instruments 
chosen are insensitive, the earliest collected data will make the discrepancy 
obvious. Certainly, at that point, which is hopefully still early in the 
evaluation process, the decision-maker and/or program developer may begin to 
address the discrepancies. It is possible that the prior likelihoods were 
accurate and new measures are necessary; it is also possible that programmatic 
changes are needed. The M.A.U.T. matrix will enable the decision-makers to 
see that the program is succeeding better on certain dimensions than on others 
and the program can be improved sequentially as the weaknesses are discovered. 
New priors will be formed from the posteriors and new data will indicate the 
extent to which the program has been improved. As mentioned above, 
ambivalence on the part of the decision-maker may make more data collection 
necessary. The M.A.U.T. - Bayesian approach intuitively conforms to the way in 
which decisions are made. 



Implications for Organizational Change 

It has been felt by those using the M.A.U.T. - Bayesian model that 
post-assessment organizational change occurs as a result of the use of the model. 
Edwards et al. (1975) discuss the social psychology of the process. 

"First, each group builds a consensus about its own values 
vis-a-vis the programs. This makes it possible to exchange 
information about the relative ordering of values between groups, so that 
discussions between groups about value differences can be quite 
explicit and quantified.... Second, the same evaluation data can be 
fed back to each group. The same data, in a matrix in which values, 
rank order, and/or importance weights differ considerably, will 
yield very different final conclusions and decisions. Thus, a 
number of groups can, using the same data, come to very different 
conclusions about whether a program or programs are meeting their 
goals. This then provides them with a substantive basis for 
discussions with one another... In addition it means that decision-makers 
receive research data on issues that may be foreign to their own 
values but quite germane to the values of other groups, for example, 
persons affected by a program, (pp ITl-Hi)" 

The M.A.U.T. - Bayesian model involves the various interest groups in 
generating values, prioritizing them, giving prior probability estimates, etc. 
These procedures force the groups to confront their values. For groups which 
have not yet done this, the process can be very illuminating. The process of 
the evaluation may potentially open new channels of communication both within 
groups who are working to achieve consensus and between groups who strive to 
understand how the different values they bring to bear affect the outcomes 
of the program. Furthermore, it leaves in its wake a strategy of 
decision-making which could be applied within the organization for future planning and 
development. 

Implications Summary 

The appeal of the M.A.U.T. - Bayesian model is in its flexibility and 
its fit to the way in which people actually make decisions. According to Ross 
and Cronbach (1976, p. 14), the various purposes of evaluation have been 
defined as, "to assess needs, to guide a 'go/no-go' decision, to provide support 
for a decision already made, to improve program plans and policies, to assist 
management by monitoring daily operations, to test social theories -- all imply 
different criteria for excellence in evaluation and different, often 
contradictory, research tactics." In theory, at least, the M.A.U.T. - Bayesian model 
Keeney and Raiffa (1972) wrote of decision analysis that "it serves as a 
learning experience for the participants. By virtue of explicitly 
examining many of the difficult issues of a particular problem, their abilities 
to think systematically about complex aspects of public problems will likely 
improve. In addition, the mathematical reasoning, measurement techniques, 
and general approach to problem solving might be transferable to different 
areas of application." (p. 70) 

12 

The author (1977) recently used the M.A.U.T. model as a needs assessment 
and planning model. After separate matrices were generated for teachers, 
superintendents, and principals, representatives from the various groups 
met to discuss each other's values and priorities and build programs 
accordingly. It is further suggested that the various groups have a 
greater sense of ownership in the resulting program after contributing 
in this way. 
can be tailored to fit any of these purposes. Whether this will prove true 
in practice remains to be seen. This area is young and the number of 
evaluations which have used this approach is small. It is acknowledged that there 
are still many questions, both theoretical and procedural, which need refinement. 
As each new problem is isolated, the solution results in a stronger technology. 
The M.A.U.T. - Bayesian framework encourages and stimulates creative problem 
solving and decision-making for the evaluator as well as the decision-maker. 



Keeney and Raiffa (1972) wrote of the methodology of a decision analysis what 
can be directly applied to the M.A.U.T. - Bayesian model: "Although there 
clearly needs to be a great amount of significant work done... we feel the 
techniques and procedures that are currently available are sufficiently 
developed to be an important aid to the decision-maker... It is important to 
accumulate critical experiences with the use of those techniques on societal 
problems. And if this effort is to make any sense at all, it is imperative that 
public officials and members of their staffs begin to use formal analysis on 
projects of importance to them. The difficulty of such efforts, as well as 
their possible benefits, should not be underestimated. Often the total value 
of such analyses is not immediately apparent but rather accrues over time as 
successive analyses improve in quality and relevancy and as people learn how 
to interpret and implement such efforts better. We should not become 
disillusioned if initial attempts are somewhat feeble; the achievement of quality 
is an evolutionary process. Thus, we believe it is important to start doing, 
documenting and critically reviewing these attempts with the spirit of learning. 
How can it be done better next time." (pp. 71-73) 
Appendix A 



Bayesian Statistical Inference 

Fig. 2. Posterior distributions obtained from two normal priors 
after n normally distributed observations. 



To illustrate both the extent to which the prior distribution can be 
irrelevant and the rapid narrowing of the posterior distribution as the 
result of a few normal observations, consider Figure 2. The top section of 
the figure shows two prior distributions, one with mean -9 and standard 
deviation 6 and the other with mean 3 and standard deviation 2. The other 
four sections show posterior distributions obtained by applying Bayes' 
theorem to these two priors after samples of size n are taken from a 
distribution with mean 0 and standard deviation 2. The samples are 
artificially selected to have exactly the mean 0. After 9, and still more 
after 16, observations these markedly different prior distributions have 
led to almost indistinguishable posterior distributions. 



Edwards, 1963 (pp. 210-12) 
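
The convergence described for Figure 2 can be checked numerically. The sketch below assumes the standard conjugate result for a normal prior combined with normally distributed observations whose standard deviation is known: the posterior precision is the prior precision plus n divided by the sampling variance, and the posterior mean is the precision-weighted average of the prior mean and the sample mean. The function name `normal_posterior` and its parameter names are assumptions introduced here; the two priors, the sampling standard deviation of 2, the sample mean of 0, and the sample sizes 9 and 16 are those given in the passage above.

```python
import math


def normal_posterior(prior_mean, prior_sd, sample_mean, sample_sd, n):
    """Return the posterior (mean, sd) for a normal prior after n normal observations.

    Assumes the sampling standard deviation is known (conjugate normal update).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sample_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


if __name__ == "__main__":
    priors = [(-9.0, 6.0), (3.0, 2.0)]  # (mean, sd) of the two priors in Figure 2
    for n in (9, 16):
        for mean, sd in priors:
            post_mean, post_sd = normal_posterior(mean, sd, 0.0, 2.0, n)
            print(f"n={n:2d} prior=({mean:5.1f}, {sd:3.1f}) "
                  f"posterior mean={post_mean:7.3f} sd={post_sd:5.3f}")
```

Under these assumptions the two posterior means are about -0.11 and 0.30 after 9 observations and about -0.06 and 0.18 after 16, with posterior standard deviations near 0.66 and 0.63, then 0.50 and 0.49, so priors that began twelve units apart end up less than a quarter of a unit apart.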